The relationship between mental health and education level has been studied extensively, with a number of theories emerging on how the two interact. Correlative trends have been observed in PhD students, possibly owing to increased stress (1.). While other studies have shown a significantly reduced incidence of mental illness in educated individuals, hypothesising a protective effect (2.).
This report aims to explore the relationship between antidepressant prescriptions and education level across Scotland, using Public Health Scotland prescribing data, and data from Scotland’s most recent census in 2022. The prescribing data will be taken from January to August 2025 and a monthly average calculated, minimising any recording of repeat prescriptions following the NHS’ prescribing recommendations indicating an optimal 28-day repeat prescription duration (3, 4.).
Prescribing data can be found here: https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community
library(tidyverse)
library(stringr)
library(janitor)
library(here)
library(ggrepel)
library(readr)
library(dplyr)
library(gt)
library(plotly)
library(forcats)
library(scales)
files <- list.files(here("data","consecutive_data"), pattern = "csv", full.names = TRUE) # 8 months of prescribing data through 2025
months <- seq(as.Date("2025-01-01"), as.Date("2025-08-01"), by = "month")
all_prescribing <- map2_df(files, months, ~ read_csv(.x,
col_types = cols(DMDCode = col_character(), # ensure DMD codes are all characters, as some are read as numbers
.default = col_guess()), show_col_types = FALSE) %>%
mutate(month = .y)) %>%
clean_names() %>%
select(hbt, gp_practice, bnf_item_code, number_of_paid_items, month)
First, we need to find all the antidepressants being prescribed in Scotland. This is made possible by the inclusion of BNF item codes in the Public Health Scotland prescribing data set. By identifying the relevant codes for antidepressants we can filter the data set down to just the drugs we want to look at.
In order to identify the code for antidepressants, we can look at the BNF code information provided by the NHS Business Services Authority and filter the data set down to just antidepressants, then calculate average monthly prescribing numbers for each health board.
BNF information is available at: https://opendata.nhsbsa.net/dataset/bnf-code-information-current-year
BNF_info <- read_csv(here("data/bnf_code_current_202506_version_88.csv")) %>%
filter(BNF_SECTION == "Antidepressant drugs") # filter the data to show just the antidepressants
unique(BNF_info$BNF_SECTION_CODE) # 0403 is the code we use to focus our prescribing data
## [1] "0403"
antidepressants <- all_prescribing %>%
filter(str_starts(bnf_item_code, "0403"), hbt != "SB0806") # remove special health board as we filter for antidepressants
antidepressants_avg <- antidepressants %>%
group_by(hbt, month) %>%
summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>% # summing the number of monthly antidepressant prescriptions for each health board
group_by(hbt) %>%
summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # prescription averages for each health board across the 8 months
Next, let’s add some names for the health boards to our data to allow for clearer visualisation down the line, and we can get an idea of the range of total prescriptions across Scotland. Using the flexible table builder on the Scottish census website, I have created a table with the numbers for the highest level of education (columns) achieved by healthboard (rows). We can also join this with the data for antidepressants, using the healthboard names.
Health board names available at: https://www.opendata.nhs.scot/dataset/geography-codes-and-labels/resource/652ff726-e676-4a20-abda-435b98dd7bdc Census table builder is available at: https://www.scotlandscensus.gov.uk/search-the-census#/
hb_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")
data_antidepressants <- hb_names %>%
inner_join(antidepressants_avg, by = c("HB" = "hbt")) %>%
clean_names() %>%
select(!c(hb_date_enacted, hb_date_archived, country)) %>%
mutate(hb_name = gsub("^NHS ", "", hb_name)) # removes NHS prefix for each health board to match the naming conventions in the census data
# we have to load the census data as raw text to extract just the information we need
raw_lines <- read_lines(here("data/education_level_healthboard.csv"))
info <- raw_lines[10:25] # keep only the headers and the data (lines 10–25), also excluding the extra total row we don't need
info <- info[-2] # remove the extra header line
census_data22 <- read_csv(
I(info), # feed character vector as a connection
col_names = TRUE, show_col_types = FALSE) %>%
clean_names() %>%
rename(hb = highest_level_of_qualification,
NA_or_under_16 = not_applicable_aged_less_than_16,
none = no_qualifications,
lower_school = lower_school_qualifications,
upper_school = upper_school_qualifications,
further_education_sub_degree = further_education_and_sub_degree_higher_education_qualifications_incl_hnc_hn_ds,
degree = degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications,
apprenticeship = apprenticeship_qualifications) %>%
remove_empty("cols")
education_antidepressants <- data_antidepressants %>%
full_join(census_data22, by = c("hb_name" = "hb"))
# We need to show our flat numbers as proportions to properly gauge the relative prescribing rate of antidepressants. Luckily, we have a "total" column from the census data that shows us the population of each healthboard, so we can create a new column with the antidepressant prescriptions per capita from it
education_antidepressants <- education_antidepressants %>%
mutate(apc = mean_items / total, degree_pct = degree/total*100) # total prescriptions per capita and percentage of degree holders in each healthboard
Now we have all the data ready to investigate the prescribing rates of the health boards.
education_antidepressants %>%
select(hb_name, mean_items, total, apc) %>%
arrange(desc(apc)) %>%
gt(
rowname_col = "hb_name"
) %>%
tab_stubhead(label = "Health board") %>%
cols_label(
mean_items = "Average monthly items",
total = "Population",
apc = "Prescriptions per capita"
) %>%
tab_spanner(
label = "Prescribing metrics",
columns = c(mean_items, apc)
) %>%
fmt_number(
columns = c(mean_items, apc),
decimals = 2
) %>%
fmt_number(
columns = total,
decimals = 0,
use_seps = TRUE
) %>%
grand_summary_rows(
columns = c(mean_items, apc),
fns = list(
"Mean" = ~ mean(.x, na.rm = TRUE)
)
) %>%
tab_source_note(
source_note = "Data: Public Health Scotland; Prescriptions in the Community"
) %>%
tab_style(
style = list(
cell_fill(color = "#e5e5e5"),
cell_text(weight = "bold")
),
locations = cells_column_labels(everything())
) %>%
tab_style(
style = cell_text(align = "left"),
locations = list(
cells_column_labels(everything()),
cells_body(columns = everything())
)) %>%
tab_header(
title = "Antidepressant Prescribing by Scottish Health Board",
subtitle = "Prescription data from January to August 2025"
) %>%
tab_options(
table.width = pct(80),
heading.title.font.size = 24,
heading.subtitle.font.size = 14,
column_labels.font.weight = "bold"
)
| Antidepressant Prescribing by Scottish Health Board | |||
| Prescription data from January to August 2025 | |||
| Health board |
Prescribing metrics
|
Population | |
|---|---|---|---|
| Average monthly items | Prescriptions per capita | ||
| Dumfries and Galloway | 24,197.25 | 0.17 | 145,895 |
| Ayrshire and Arran | 55,730.25 | 0.15 | 365,256 |
| Lanarkshire | 99,087.38 | 0.15 | 668,027 |
| Greater Glasgow and Clyde | 166,587.62 | 0.14 | 1,177,213 |
| Forth Valley | 41,738.50 | 0.14 | 302,784 |
| Western Isles | 3,587.50 | 0.14 | 26,140 |
| Fife | 49,578.62 | 0.13 | 371,781 |
| Borders | 15,569.50 | 0.13 | 116,821 |
| Shetland | 2,997.12 | 0.13 | 22,986 |
| Tayside | 51,971.12 | 0.13 | 413,992 |
| Highland | 37,482.75 | 0.12 | 321,321 |
| Orkney | 2,526.00 | 0.12 | 21,958 |
| Grampian | 62,793.38 | 0.11 | 581,040 |
| Lothian | 93,846.25 | 0.10 | 904,628 |
| Mean | 50549.52 | 0.1321014 | — |
| Data: Public Health Scotland; Prescriptions in the Community | |||
The table shows the average monthly antidepressant prescriptions per capita from January to August 2025 in all 14 Scottish health boards. Dumfries and Galloway is the clear leader in prescribing rate, with a relatively noticeable gap between it and Ayrshire and Arran following in second, while Lothian has the lowest rate of the lot, but not too far removed from the other boards towards the bottom like Grampian, Orkney and Highland. The total range of prescriptions per capita is noticeable, with the lowest at 0.1 and the highest at 0.17, it’s almost a 70% increase. This indicates meaningful geographical variation in antidepressant prescribing across Scotland, with many factors possibly contributing to the inequalities, such as: local policy, socioeconomic disparities, and baseline population characteristic differences.
Now, using the same census data, we should have a look at the education levels of each health board relative to each other, so we can get an idea of the landscape of this variable.
education_data <- education_antidepressants %>%
pivot_longer(
c(NA_or_under_16:degree),
names_to = "edu_lvl",
values_to = "edu_lvl_count"
) %>%
mutate(
edu_lvl_pct = edu_lvl_count / total, edu_lvl = factor(edu_lvl, levels = rev(c( # reverse order of census progression for better visualisation
"NA_or_under_16",
"none",
"lower_school",
"upper_school",
"apprenticeship",
"further_education_sub_degree",
"degree"
))
),
edu_lvl = fct_recode( # plotly doesn't adopt legend labels set in ggplot so have to label the levels properly here
edu_lvl,
"Under 16 / N.A." = "NA_or_under_16",
"No qualifications" = "none",
"Lower school" = "lower_school",
"Upper school" = "upper_school",
"Apprenticeship" = "apprenticeship",
"Further education (sub-degree)" = "further_education_sub_degree",
"Degree or above" = "degree"
)
)
graph1 <- education_data %>%
ggplot(aes(
x = edu_lvl_pct,
y = reorder(hb_name, degree_pct),
fill = edu_lvl,
text = paste(
"Education level:", edu_lvl,
"<br>Percentage:", percent(edu_lvl_pct, accuracy = 0.1)
)
)) +
geom_col(colour = "grey20", linewidth = 0.2) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_manual(
name = "Highest level of qualification",
values = c(
"Under 16 / N.A." = "#f2e5ff",
"No qualifications" = "#d8b7ff",
"Lower school" = "#be89ff",
"Upper school" = "#a35cff",
"Apprenticeship" = "#8930ff",
"Further education (sub-degree)" = "#7013e6",
"Degree or above" = "#4d0099"
)
) +
labs(
x = "Percentage of health board population",
y = "Health board",
title = "Education level distribution by Scottish health board",
) +
theme(legend.position = "right",
legend.title = element_text(size = 9),
legend.text = element_text(size = 8))
ggplotly(graph1, tooltip = "text")
…
We’ll do this by using the percentage of each health board’s population that are degree holders as calculated earlier, and running a linear regression to determine the correlation between the two variables.
graph2 <- ggplot(education_antidepressants, aes(x = degree_pct, y = apc, fill = hb_name)) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = TRUE) +
labs(
x = "Degree holders (%)",
y = "Antidepressant prescriptions per capita",
title = "Association between education level and antidepressant prescribing by Scottish Health board",
subtitle = "Prescribing data from January to August 2025"
)
ggplotly(graph2)
plot_ly(education_antidepressants,
x = ~degree_pct,
y = ~apc,
color = ~hb_name,
type = "scatter",
mode = "markers") %>%
add_lines(y = fitted(lm(apc ~ degree_pct, data = education_antidepressants)))
#geom_text_repel(aes(label = hb_name), size = 3) +
education_antidepressants <- education_antidepressants %>%
mutate(pred = fitted(lm(apc ~ degree_pct, data = .))) %>%
arrange(degree_pct)
plot_ly(
education_antidepressants,
x = ~degree_pct,
y = ~apc,
type = "scatter",
mode = "markers",
color = ~hb_name
) %>%
add_lines(
x = ~degree_pct,
y = ~pred,
inherit = FALSE, # don't reuse marker aesthetics
name = "Line of best fit"
)
A surface level interpretation of this graph says that there is a strong negative correlation between the variables: health boards with a higher percentage of degree holders have a lower antidepressant prescribing rate. To take just the extremes, Dumfries and Galloway with the highest prescribing rate sits at the far left of the graph, among the bottom health boards in degree attainment, while Lothian has the lowest prescribing rate and is directly juxtaposed, with the highest degree attainment by a large margin. In isolation, this is a strong point backing the negative correlation.
However, the standard error is of moderate size, and the majority of the health boards are distributed quite close to each other in education level, while still displaying a range of prescription rates, detracting from the credibility of the suggested relationship.
It is clear there is variation in the variables, and possibly a relationship between them, but the geographical stratification is too large at the level of health boards. To try and get a clearer image of the relationship, let’s do the same analysis, but with Scottish data zones, focusing in on much smaller localities.
The original prescribing data already had GP practice codes, so we can map the prescriptions to data zones using a practice lookup data set from Public Health Scotland. After that we’ll do the same as we did to the prescribing data before, finding the total prescriptions per month and then averaging, just this time by data zone instead of health board, and then adding on the census data for education, again using the flexible table builder with data zones instead of health boards.
GP practices and list sizes October 2025 available from: https://www.opendata.nhs.scot/dataset/gp-practice-contact-details-and-list-sizes/resource/47557411-7eda-4278-9d6d-d26ed2ceab5a?inner_span=True
## load in practice details
practice_info <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/47557411-7eda-4278-9d6d-d26ed2ceab5a/download/practice_contact_details_20251001_opendata.csv") %>% clean_names()
antidepressants_practice <- antidepressants %>%
inner_join(practice_info, by = c("gp_practice" = "practice_code")) %>%
select(c(gp_practice, number_of_paid_items, month, data_zone)) %>%
group_by(data_zone, month) %>%
summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>% # total prescriptions per month for each data zone
group_by(data_zone) %>%
summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # average monthly prescriptions from Jan-Aug 2025 for each data zone
raw_lines_dz <- read_lines(here("data/census_education_dz.csv"))
dz_rows <- grep('^"S\\d{8}"', raw_lines_dz) # use regex to find just the rows with the data we need
core_text_dz <- c(raw_lines_dz[10], raw_lines_dz[dz_rows]) # combine header row with the data
census_dz <- read_csv(
I(core_text_dz), # character vector fed in
show_col_types = FALSE) %>%
clean_names() %>%
rename(data_zone = highest_level_of_qualification) %>%
select(data_zone, degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications, total) # only need these columns for investigation
antidepressants_education_dz <- inner_join(antidepressants_practice, census_dz)
Now, we can just add the antidepressants per capita and degree holder % to the data set and we can get a similar linear regression to what we had above, with data zones this time.
antidepressants_education_dz <- antidepressants_education_dz %>%
mutate(apc = mean_items/total, # antidepressants per capita
degree_pct = (degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications/total)*100) # degree holder percentage for each data zone
graph3 <- ggplot(antidepressants_education_dz, aes(x = degree_pct, y = apc)) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = TRUE) +
labs(
x = "Degree holders (%)",
y = "Antidepressant prescriptions per capita",
title = "Association between education level and antidepressant prescribing by Scottish Data Zone",
subtitle = "Prescribing data from January to August 2025"
)
ggplotly(graph3, tooltip = c("x", "y"))
This graph communicates a similar relationship as the data on health boards: that a higher educational attainment is negatively correlated with antidepressant prescribing. Again though, the data doesn’t confidently support the suggestion, with only a weakly negative slope. Most of the data zones are distributed around 0-3 antidepressants per capita, with a select few outliers near 10 and even one above 15. These few data zones with abnormally high prescribing rates are on the lower end in educational attainment, but this doesn’t reflect a cut and dry, certain relationship.
Despite the uncertainty of the visualisations produced, there are still valid conclusions to be drawn from this exploration.
It is clear that there are localities with significantly high antidepressant prescribing rates relative to the rest of the country, and it’s unlikely to be a coincidence that these are areas with a relatively low education attainment rate. It would be valuable to further investigate just these outlying data zones, to identify disparities in other characteristics of the area like deficient/harmful infrastructure or local policy, insufficient access to support schemes, or disadvantageous environmental factors among many others. These are all factors that would impact both education attainment and antidepressant use, as the two are inextricably linked and involved in constant, lifelong interactions with personal, communal, and systemic aspects of being.